Very Large Lexical Databases: An ACL Tutorial
Abstract
The WWW is two orders of magnitude larger than the largest corpora. Although noisy, web text presents language as it is used, and statistics derived from the Web can have practical uses in many NLP applications. For this reason, the WWW should be seen and studied as any other computationally available linguistic resource. In this article, we illustrate this by showing that an example-based approach to lexical choice for machine translation can use the Web as an adequate and free resource.
Similar resources
Parsing in the Absence of a Complete Lexicon
It is impractical for natural language parsers that serve as front ends to large or changing databases to maintain a complete in-core lexicon of words and meanings. This note discusses a practical approach to using alternative sources of lexical knowledge by postponing word categorization decisions until the parse is complete, and resolving remaining lexical ambiguities using a variety of inf...
Deriving Verbal and Compositional Lexical Aspect for NLP Applications
Verbal and compositional lexical aspect provide the underlying temporal structure of events. Knowledge of lexical aspect, e.g., (a)telicity, is therefore required for interpreting event sequences in discourse (Dowty, 1986; Moens and Steedman, 1988; Passonneau, 1988), interfacing to temporal databases (Androutsopoulos, 1996), processing temporal modifiers (Antonisse, 1994), describing allowable a...
Multilingual Lexical Database Generation from Parallel Texts in 20 European Languages with Endogenous Resources
This paper deals with multilingual database generation from parallel corpora. The idea is to contribute to the enrichment of lexical databases for languages with few linguistic resources. Our approach is endogenous: it relies on the raw texts only and does not require external linguistic resources such as stemmers or taggers. The system produces alignments for the 20 European languages of the ‘...
Aligning WordNet with Additional Lexical Resources
This paper explores the relationship between WordNet and other conventional linguistically-based lexical resources. We introduce an algorithm for aligning word senses from different resources, and use it in our experiment to sketch the role played by WordNet, as far as sense discrimination is concerned, when put in the context of other lexical databases. The results show how and where the resou...
Methods for the Qualitative Evaluation of Lexical Association Measures
This paper presents methods for a qualitative, unbiased comparison of lexical association measures and the results we have obtained for adjective-noun pairs and preposition-noun-verb triples extracted from German corpora. In our approach, we compare the entire list of candidates, sorted according to the particular measures, to a reference set of manually identified “true positives”. We also sho...